[SPARK-18751][Core]Fix deadlock when SparkContext.stop is called in Utils.tryOrStopSparkContext #16178

Closed
wants to merge 3 commits into from

zsxwing
Member

@zsxwing zsxwing commented Dec 6, 2016

What changes were proposed in this pull request?

When SparkContext.stop is called from within Utils.tryOrStopSparkContext (which is used in the following three places), it can deadlock, because the stop method waits for the very thread that is running stop to exit.

  • ContextCleaner.keepCleaning
  • LiveListenerBus.listenerThread.run
  • TaskSchedulerImpl.start

This PR adds SparkContext.stopInNewThread and uses it to eliminate the potential deadlock. I also removed my changes in #15775 since they are not necessary now.
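
The self-join problem and the new-thread workaround described above can be illustrated with a minimal sketch. This is not Spark's code: `MiniContext`, its worker thread, and `stoppedLatch` are hypothetical names chosen for illustration, and the real `SparkContext.stopInNewThread` may differ in its details.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for a component that owns a worker thread.
// It only mirrors the shape of the problem; it is not Spark code.
class MiniContext {
  private val stopped = new AtomicBoolean(false)
  val stoppedLatch = new CountDownLatch(1)

  // The worker simulates hitting a fatal error and asking for shutdown.
  private val worker: Thread = new Thread("mini-worker") {
    override def run(): Unit = {
      // Calling stop() directly here would make the worker join itself
      // and block forever, so the call is handed to a fresh thread.
      stopInNewThread()
    }
  }

  def start(): Unit = worker.start()

  /** Stops the context and waits for the worker thread to exit. */
  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      worker.join() // would never return if the caller were the worker
      stoppedLatch.countDown()
    }
  }

  /** Runs stop() on a separate daemon thread so the caller never waits on itself. */
  def stopInNewThread(): Unit = {
    val t = new Thread("stop-mini-context") {
      override def run(): Unit = MiniContext.this.stop()
    }
    t.setDaemon(true)
    t.start()
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val ctx = new MiniContext
    ctx.start()
    // Wait (with a timeout) for the shutdown to complete.
    val finished = ctx.stoppedLatch.await(5, TimeUnit.SECONDS)
    println(s"stopped cleanly: $finished")
  }
}
```

In this sketch the worker returns right after scheduling the stop, so the `join()` in the stop thread completes. Replacing `stopInNewThread()` with a direct `stop()` call in the worker would make the worker wait on itself.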

How was this patch tested?

Jenkins

@zsxwing zsxwing changed the title Fix deadlock when SparkContext.stop is called in Utils.tryOrStopSparkContext [SPARK-18751][Core]Fix deadlock when SparkContext.stop is called in Utils.tryOrStopSparkContext Dec 6, 2016
@zsxwing
Member Author

zsxwing commented Dec 6, 2016

cc @rxin

_stop()
}
override def run(): Unit = {
SparkContext.this.stop()
Contributor

Will this ever throw an exception? Should we register an UncaughtExceptionHandler or try catch with logging?

Member Author

> Will this ever throw an exception? Should we register an UncaughtExceptionHandler or try catch with logging?

This happens in the driver, so we cannot use SparkUncaughtExceptionHandler to catch the error. The error will be sent to the user's UncaughtExceptionHandler if one is specified; otherwise it is just printed to stderr.
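
As a rough illustration of the fallback mentioned above, the sketch below shows how an exception that escapes a thread's `run` reaches a default `Thread.UncaughtExceptionHandler` installed by the application. `HandlerDemo` and `runAndCapture` are hypothetical names for this example only; the sketch uses only the standard JVM `Thread` API and says nothing about how Spark itself handles the error.

```scala
import java.util.concurrent.atomic.AtomicReference

object HandlerDemo {
  /**
   * Runs `body` on a new daemon thread and returns the throwable that the
   * default uncaught-exception handler observed, if any.
   */
  def runAndCapture(body: () => Unit): Option[Throwable] = {
    val seen = new AtomicReference[Throwable](null)
    val previous = Thread.getDefaultUncaughtExceptionHandler
    // Stand-in for a handler installed by the user's application.
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      override def uncaughtException(t: Thread, e: Throwable): Unit = seen.set(e)
    })
    try {
      val t = new Thread("stop-thread-sketch") {
        override def run(): Unit = body()
      }
      t.setDaemon(true)
      t.start()
      // The handler runs on the dying thread before it terminates,
      // so the result is visible once join() returns.
      t.join()
    } finally {
      // Restore whatever handler was installed before.
      Thread.setDefaultUncaughtExceptionHandler(previous)
    }
    Option(seen.get())
  }
}
```

With no handler installed at all, the JVM's default behavior is to print the stack trace of the uncaught exception to stderr.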

/**
* Shut down the SparkContext.
*/
def stop() {
Contributor

This needs a proper signature (...: Unit = {)
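
For readers unfamiliar with the distinction, the reviewer is asking for an explicit `Unit` result type instead of Scala's procedure syntax. A minimal sketch of the requested style follows; `Service`, `running`, and `isRunning` are hypothetical names, not part of the patch.

```scala
// Hypothetical class used only to show the method signature style.
class Service {
  private var running = true

  def isRunning: Boolean = running

  /**
   * Shut down the service.
   * Procedure syntax would read `def stop() { ... }`; the explicit
   * form below states the result type and uses `=`.
   */
  def stop(): Unit = {
    running = false
  }
}
```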

@SparkQA

SparkQA commented Dec 7, 2016

Test build #69750 has finished for PR 16178 at commit 7131a96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2016

Test build #69751 has finished for PR 16178 at commit 2cd4282.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 8, 2016

Test build #69821 has finished for PR 16178 at commit ae3d42c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Dec 8, 2016

LGTM

@zsxwing
Member Author

zsxwing commented Dec 8, 2016

Thanks! Merging to master and 2.1.

asfgit pushed a commit that referenced this pull request Dec 8, 2016
…Utils.tryOrStopSparkContext

## What changes were proposed in this pull request?

When `SparkContext.stop` is called in `Utils.tryOrStopSparkContext` (the following three places), it will cause deadlock because the `stop` method needs to wait for the thread running `stop` to exit.

- ContextCleaner.keepCleaning
- LiveListenerBus.listenerThread.run
- TaskSchedulerImpl.start

This PR adds `SparkContext.stopInNewThread` and uses it to eliminate the potential deadlock. I also removed my changes in #15775 since they are not necessary now.

## How was this patch tested?

Jenkins

Author: Shixiong Zhu <[email protected]>

Closes #16178 from zsxwing/fix-stop-deadlock.

(cherry picked from commit 26432df)
Signed-off-by: Shixiong Zhu <[email protected]>
@asfgit asfgit closed this in 26432df Dec 8, 2016
@zsxwing zsxwing deleted the fix-stop-deadlock branch December 8, 2016 20:08
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017